Disease-associated microbiome signature species in the gut

Abstract There is an accumulation of evidence that the human gut microbiota plays a role in maintaining health, and that an altered gut microbiota (sometimes called dysbiosis) associates with risk for many noncommunicable diseases. However, the dynamics of disease-linked bacteria in the gut and other body sites remain unclear. If microbiome alterations prove causative in particular diseases, therapeutic intervention may be possible. Furthermore, microbial signature taxa have been established for the diagnosis of some diseases like colon cancer. We identified 163 disease-enriched and 98 disease-depleted gut microbiome signature taxa at species-level resolution (signature species) from 10 meta-analyses of multiple diseases such as colorectal cancer, ulcerative colitis, Crohn's disease, irritable bowel syndrome, pancreatic cancer, and COVID-19 infection. Eight signature species were enriched and nine were depleted across at least half of the diseases studied. Compared with signature species depleted in diseases, a significantly higher proportion of disease-enriched signature species were identified as extra-intestinal (primarily oral) inhabitants, had been reported in bacteremia cases from the literature, and were aerotolerant anaerobes. These findings highlight the potential involvement of oral microbes, bacteremia isolates, and aerotolerant anaerobes in disease-associated gut microbiome alterations, and they have implications for patient care and disease management.


Introduction
Two decades of research have identified many significant associations between altered configurations of the human gut microbiome and diverse noncommunicable diseases (1).Microbial signatures in the gut that are consistently altered across disease cohorts, referred to here as signature taxa, have been observed in specific diseases (2)(3)(4)(5).The gut microbiome has been hypothesized to act as an environmental factor in the development of multifactorial diseases and has therefore emerged as a potential therapeutic target.However, it is unclear whether microbiome alterations cause disease or vice versa, or if both are caused by a third factor (6). Understanding the mechanisms that drive microbiome alterations remains a central goal in human microbiome research.Growing evidence supports the notion that the translocation of oral microbes to the gut may contribute to the pathogenesis of chronic diseases (7)(8)(9)(10)(11), particularly periodontitisrelated oral microbes which may induce dysbiosis in the gut microbiota following their introduction by swallowing (12) or from the blood circulatory system (13).In healthy subjects, microbes may translocate from other body sites into the bloodstream during daily activities such as conducting oral hygiene, following minor medical procedures, or skin injuries (14)(15)(16).Bacteremia can progress to a bloodstream infection when the immune response mechanisms are compromised or overwhelmed and might lead to more serious complications (16).Bacteremia is associated with an increased risk of a subsequent diagnosis of colorectal cancer (CRC) (17,18).Additionally, hospitalized patients with inflammatory bowel disease (IBD) have a higher risk of bacteremia compared with patients without IBD (19).
Understanding the link between the altered gut microbiome, particularly microbial signature taxa, and host effect is crucial for understanding human health and disease.Recent metastudies have identified reproducible disease-specific microbial signatures across cohorts and populations (2)(3)(4), commonly characterized by the depletion of commensals associated with healthy subjects and enrichment of disease-associated pathobionts (20).Two meta-analyses reported disease-specific and overlapping gut microbial signatures at the genus-level across different diseases (5,21).However, genus-level microbial signatures are not sufficiently robust, because species in the same genus can show contrasting phenotypes (22).High-resolution microbial signatures allow for the exploration of microorganism characteristics (e.g.body site niche, capacity for bloodstream infections, and aerotolerance), which are valuable for understanding disease pathogenesis and for developing microbiome-targeted therapies.A meta-analysis identified common and disease-specific microbiome signature species between CRC and IBD; however, it has focused on developing predictive models for disease diagnosis (23).To advance our understanding of the potential gateways whereby bacteria gain access to the gut, we identified robust microbial signature species from 10 meta-analyses of the gut microbiome associated with different diseases.

Results and discussion
We compiled a total of 273 unique gut microbial signature species associated with multiple diseases (MD), including CRC, IBD which includes ulcerative colitis (UC) and Crohn's disease (CD), irritable bowel syndrome (IBS), pancreatic cancer (PC), COVID-19, and MD (which was treated here as one disease) (Table S1).We investigated the characteristics of these signature species, namely whether they had been reported in the literature as detected in bloodstream infections (hereinafter referred to as bacteremia), their aerotolerance, their typical body site/niche, and we investigated the proportional distribution of these characteristics between disease-enriched and disease-depleted signature species.Out of the 273 signature species, 163 are enriched in at least one disease, 98 are depleted in at least one disease and 12 show inconsistent abundance directions across diseases (Fig. 1).Among the 163 signature species enriched in diseases, 105 (64.4%) have been reported in cases of bacteremia, 65 (39.9%) are aerotolerant (i.e.facultative/aerotolerant anaerobe or aerobe), 68 (41.7%) are normal extra-intestinal inhabitants, and 51 are normal inhabitants in the oral cavity (Table 1).Notably, 52 signature species are enriched in MD (i.e.≥2 diseases), and 84.6% of these 52 species are detected in bacteremia, which is significantly higher than the proportion of bacteremia isolates enriched in one single disease (55.0%,P = 2.0e−04, Fisher's exact test, Table 1).Of the signature species that are enriched in MD and bacteremia, 19 are oral inhabitants, 19 are gut inhabitants, and one is a normal vaginal inhabitant, suggesting the origin of bacteremia may be gut or oral inhabitants.
Among these multiple disease-associated signature species, Fusobacterium nucleatum and Gemella morbillorum, which are oral inhabitants, and Bacteroides fragilis (a gut inhabitant) are enriched in CRC, and patients who survived bloodstream infections caused by any of these three signature species had a higher risk of subsequent CRC compared with patients without bloodstream infections (17).Among the 98 disease-depleted signature species, only 16 (16.3%)are reported in bacteremia, 9 (9.2%) are aerotolerant, 7 (7.1%) are extra-intestinal inhabitants, and 2 are inhabitants of the oral cavity (Table 1).34 signature species are depleted in MD, with 14.7% of these being reported in cases of bacteremia.This proportion is lower than the proportion depleted in a single disease (17.2%, P = 1, Fisher's exact test, Table 1), which contrasts with the corresponding comparison for disease-enriched signature taxa.This suggests that gut microbial signature taxa enriched in diseases, particularly MD, tend to be detected in bacteremia that may originate from either the oral cavity or gut.
Thirdly, the proportion of disease-enriched signature species that are aerotolerant is significantly higher than that of depleted signature species (OR = 6.5, P = 3.7e−08, Fisher's exact test, Table 1), irrespective of the number of associated diseases (1 disease: P = 3.1e−04; ≥2 diseases: P = 4.6e−07), suggesting that disrupted gut redox homeostasis may facilitate the overgrowth of aerotolerant/facultative anaerobes and may be involved in pathogenesis.Alternatively, intestinal disease may create a more oxygen-rich environment which selects aerotolerant signature species.It is, however, conceivable that some other phenotypic properties that were not investigated could have affected the results.A significantly higher proportion of aerotolerant taxa was also observed among signature species that are detected in bacteremia (P = 2.3e−03) or extra-intestinal inhabitants (P = 3.8e −02), but not in signature taxa that are nonbacteremia isolates (P = 4.1e−01) or intestinal inhabitants (P = 1.4e−01).Patients with numerous diseases including IBD or CRC have been found to have an elevated abundance of facultative anaerobes in their feces (10,24).There are a number of proposed or established methods by which diet may modulate host ability to regulate the availability of respiratory electron acceptors in the intestine such as oxygen and nitrate, which could contribute to modulating the abundance of aerotolerant anaerobes by controlling the luminal environment (24,25).
Fourth, among disease-enriched signature species, the proportion of aerotolerant signature species detected in bacteremia is much higher than that of nonaerotolerant signature species (OR = 7.2, P = 3.3e−07, Fisher's exact test, Table 1).However, no significant difference associated with aerotolerance was found for disease-depleted signature species (P = 6.5e−01).Likewise, the proportion of signature taxa that are normal extra-intestinal inhabitants (or normal oral inhabitants) is significantly higher than that of signature species that are normal intestinal inhabitants among signature species enriched in disease but not depleted in disease (enriched: P < 0.001; depleted: P > 0.2).
Furthermore, we sought to determine whether there were any phylogenetic patterns in these signature species.Remarkably, among the signature species in the Bacilli class, 35 are enriched in disease, three (Lactobacillus rogosae, Ligilactobacillus ruminis, and Weissella confuse) are depleted in disease, and Enterococcus faecium shows inconsistent results between CRC and CD.Although there are numerous well-known probiotic species and strains within the representative Lactobacillus genus of the Bacilli class, only one species, L. rogosae within the Lactobacillus genus is identified as a signature species depleted in disease.On the other hand, two Lactobacillus spp.(L.gasseri and L. delbrueckii) are identified as signature taxa enriched in disease (Fig. 1).In addition, signature species within the Peptoniphilaceae family and the Enterocloster, Fusobacterium, and Veillonella genera are consistently enriched in diseases, while signature species from the Coprococcus and Roseburia genera are consistently depleted in diseases (Fig. 1).Intriguingly, multiple species in the genera Anaerostipes, Bacteroides, Bifidobacterium, Blautia, and Ruminococcus have been  S2.
Given that four meta-studies associated with CRC (2, 29-31) were included in this work, we further investigated the reproducibility of the microbial signature species identified in the four ) in two meta-studies, 3 (5.5%) in three metastudies, and only 4 (7.3%) were identified across all four meta-studies (Table S2).These results indicate that there is a low level of signature robustness between the CRC meta-studies.Despite the heterogeneity of the metagenomic sequencing cohorts with regard to laboratory methodology, DNA sequencing, and geographic region, this is less likely to be the primary factor contributing to the low comparability between meta-studies, given that the majority of the cohorts included in each meta-study have been included in other CRC meta-studies as well (https:// github.com/junhuili/gut_biomarkers/blob/main/supplementary_information.xlsx).The bioinformatics pipelines and methodologies (e.g.covariate adjustment, differential abundance analysis, mixed-effect model, random effects meta-analysis models, machine learning model) employed for the identification of signature species across cohorts may be a significant contributing factor to the discrepancy between the CRC meta-studies.Our study highlights the critical necessity for a meta-meta-analysis wherein all data would be processed in a highly standardized way.This study was limited by compiling microbial signature species from meta-studies that included heterogeneous cohorts with regard to the lab methodology and DNA sequencing, particularly bioinformatics pipelines and methodologies employed for identifying microbial signature species.One limitation of relying on literature-reported bloodstream infections for microbial signature taxa is that the reported isolation does not necessarily infer causative bloodstream infections in the patients.Rather, it simply indicates the signature taxon was detected as present in the blood in certain disease cases.Another limitation is that certain signature species associated with multiple habitats or with unknown aerotolerance were excluded in the respective statistical analyses.A further limitation in the potential usage of our study findings for the prevention and treatment of microbiome-related diseases is the lack of longitudinal characterization within each patient, which hinders our ability to determine the variability in presentation of a signature species.Furthermore, a significant challenge is the lack of quantitative microbiome profiling, which limits the ability to elucidate the interplay between microbiome features and host health.Further metaanalysis is warranted when such data become available to evaluate the variation of signature species.
In summary, our findings reveal that a greater proportion of gut microbial signature taxa enriched in diseases are typically extra-intestinal (primarily oral) inhabitants, are detected in bloodstream infections as reported in the literature, and are aerotolerant, compared with signature taxa depleted in diseases.The emerging picture further supports the notion [reviewed in O'Toole et al. (27)] that establishment of oral inhabitants in the gut may potentially contribute to microbiome alterations and associated diseases, and highlights the importance of evaluating the gut as a source of bloodstream infections in clinical practice.A holistic understanding of within-subject microbiome dynamics may lead to opportunities for prospective risk reduction of noncommunicable microbiome-related diseases.

Criteria for defining the body site, bacteremia, and aerotolerance of microbial signature species
The modified criteria for defining oral bacteria (8) were used to determine the body site of microbial signature species.We calculated the frequency and the average relative abundance of signature species across four human body sites (gut, oral cavity, vagina, and skin) in the NIH Human Microbiome Project (HMP1) using HMSMCP2-metagenomic taxon abundances (https:// www.hmpdacc.org/hmsmcp2/),and the body site that had the highest frequency and average relative abundance of the signature species was designated as the body site for the signature species.For signature species meeting any of the following conditions: a highest frequency of less than 5% across four body sites, either the highest frequency or the highest average relative abundance (but not both), a gut/oral cavity ratio between 0.1 and 10, and not detected in the HMSMCP2, further considerations were taken into account, including information concerning the signature species in the expanded Human Oral Microbiome Database v3.1 (36), the frequency and the average relative abundance of signature species in the gut shotgun metagenomes of healthy individuals in ExperimentHub (n = 12,788) (37) and American Gut Project (n = 229) (38), and/or the isolation source of signature species from NCBI genomes (https://github.com/junhuili/gut_biomarkers/blob/main/supplementary_information.xlsx) and literature references.See Table S2 for detailed information of the normal body site for each signature taxon.
To determine if the signature species are detected in cases of bacteremia, we searched several databases including MEDLINE, PubMed, Scopus, Web of Science, and Google Scholar using the terms "signature taxon name" AND "bacteremia" and "signature taxon name" AND "bloodstream infection" in September 2023.The search included both current and former names of the signature species, with or without a genus abbreviation.If a patient has positive blood cultures associated with a certain disease, infection, or symptom, the signature species was considered as bloodstream infection (or bacteremia).Additionally, we conducted a thorough search of these databases to investigate the aerotolerance of signature species: aerotolerant (facultative/aerotolerant anaerobe, aerobe) or nonaerotolerant (obligate anaerobe).The aerotolerance of five signature taxa, Anaeromassilibacillus sp.An250, Lachnoclostridium sp.An138, Firmicutes bacterium CAG 170, F. bacterium CAG 238, and F. bacterium CAG 94, is unknown.

Statistical analysis
Fisher's exact test was applied to evaluate the significant difference in feature proportions between different conditions.Signature species associated with multiple body sites or without known aerotolerance levels were not included in the statistical analyses.The phylogenetic tree, which includes all signature species, was obtained from NCBI common tree and visualized using iTOL v6 (39).

Fig. 1 .
Fig.1.Properties of 273 disease-associated microbial signature taxa based on data extracted from 10 meta-analysis studies.Inner circle: species with names in red text indicate they are enriched in at least two diseases; species names in blue text indicate those depleted in at least two diseases; species names in brown text indicate inconsistent signature taxon across diseases.Solid star in black after taxon name indicates the species was detected in the bloodstream; solid circle at the end of node branch indicates aerobe; solid triangle in black at the end of node branch indicates facultative/aerotolerant anaerobe or; hollow triangle at the end of node branch indicates obligate anaerobe; solid triangle in gray at the end of node branch indicates unclassified anaerobe.Middle circle: Red square indicates signature taxon enriched in disease, and blue square indicates signature taxon depleted in disease.CRC, colorectal cancer; CD, Crohn's disease; UC, ulcerative colitis; IBS, irritable bowel syndrome; PC, pancreatic cancer.Outer circle: shape and color indicate typical body site/niche of signature species, with hollow black rectangle for gut, hollow blue circle for oral cavity, hollow green triangle for vagina, hollow purple star for skin, solid black circle for food origin, and solid black rectangle for other.The presence of multiple shapes for a signature species indicates that the signature species is associated with multiple body sites.Data used in Fig.1can be found in TableS2.

Table 1 .
Proportion of enriched and depleted signature species as a function of number of associated diseases number, body site, and aerotolerance.